feat:Reduce unnecessary LLM calls from the respective channels #242
Conversation
…nce Tailwind CSS configuration for improved styling
…ist sections with improved styling and functionality
…onents with QuickLink for improved hash navigation, and update Waitlist component for type safety
…default behavior in handleSmoothScroll function in Navbar
…rrency control for message classification.
Caution: Review failed. The pull request is closed.

📝 Walkthrough

This PR adds smart caching and pattern-based classification to reduce unnecessary LLM API calls, featuring TTL caches, concurrent request deduplication, and regex-based fast-path detection. Simultaneously, the landing page is redesigned using Material-UI and Framer Motion for improved visual consistency and interactivity.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
participant Client as Message Source
participant Router as ClassificationRouter
participant Cache as Cache Layer
participant Patterns as Pattern Matcher
participant Dedup as In-Flight Dedup
participant LLM as LLM Service
Client->>Router: incoming message
Router->>Router: validate message length
Router->>Cache: normalize & lookup cache key
Cache-->>Router: cache hit (if exists)
Router-->>Client: return cached result
rect rgba(200, 150, 255, 0.3)
Note over Router,LLM: Cache miss path
Router->>Patterns: check simple patterns
Patterns-->>Router: match found?
alt Pattern matched (fast-path)
Router-->>Client: return pattern classification
else No pattern match
Router->>Dedup: check in-flight requests
alt Request already in-flight
Dedup-->>Router: wait for result future
Dedup->>Cache: store result on completion
Cache-->>Router: return stored result
else First request for this message
Router->>LLM: acquire semaphore & invoke
LLM-->>Router: parse JSON response
Router->>Cache: store result
Router->>Dedup: fulfill all waiting futures
Cache-->>Client: return cached result
end
end
end
```
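For orientation, the following is a minimal Python sketch of the fast paths in this flow. Identifiers such as `classify_message`, `SIMPLE_PATTERNS`, `_classify_with_llm`, and the result-cache TTL are illustrative assumptions, not the PR's actual code.

```python
import hashlib
import re
from cachetools import TTLCache

# Illustrative only -- names and the result-cache TTL are assumptions.
_result_cache = TTLCache(maxsize=1000, ttl=300)   # cached classification results
SIMPLE_PATTERNS = {
    "greeting": re.compile(r"^\s*(hi|hello|hey|thanks?|thank you)\b", re.I),
    "command": re.compile(r"^\s*[!/][a-z]+", re.I),
}

def _cache_key(message: str) -> str:
    # Normalize whitespace and case so trivially different messages share one entry.
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

async def classify_message(message: str) -> dict:
    key = _cache_key(message)

    # 1. Cache hit: skip the LLM entirely.
    if key in _result_cache:
        return _result_cache[key]

    # 2. Regex fast path: trivially classifiable messages never reach the LLM.
    for label, pattern in SIMPLE_PATTERNS.items():
        if pattern.search(message):
            result = {"label": label, "source": "pattern"}
            _result_cache[key] = result
            return result

    # 3. Cache miss with no pattern match: dedup in-flight requests and call the
    #    LLM behind a semaphore (see the sketches under "Changes Made" below).
    result = await _classify_with_llm(message)  # assumed LLM helper
    _result_cache[key] = result
    return result
```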
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Closes #241
📝 Description
This PR reduces unnecessary LLM API calls triggered by user messages in the Discord and GitHub channels by implementing a multi-layer caching and optimization system for message classification.
🔧 Changes Made
cache_helpers.py
- MAX_MESSAGE_LENGTH (10KB): truncates oversized messages before processing to prevent DoS attacks and excessive memory usage (patches security issues)
- _inflight dict backed by TTLCache(maxsize=1000, ttl=120) to prevent memory leaks from orphaned requests
- asyncio.CancelledError handling to prevent request hangs when tasks are cancelled (see the sketch below)
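As a rough illustration of the in-flight deduplication and cancellation handling described above (a sketch under stated assumptions, not the PR's actual implementation; the helper `_classify_with_llm` is assumed):

```python
import asyncio
from cachetools import TTLCache

# Sketch only -- helper names like _classify_with_llm are assumptions.
_inflight: TTLCache = TTLCache(maxsize=1000, ttl=120)  # cache key -> asyncio.Future

async def classify_once(key: str, message: str) -> dict:
    existing = _inflight.get(key)
    if existing is not None:
        # Another task is already classifying this message; just await its result.
        return await asyncio.shield(existing)

    loop = asyncio.get_running_loop()
    future: asyncio.Future = loop.create_future()
    _inflight[key] = future
    try:
        result = await _classify_with_llm(message)  # assumed LLM helper
        future.set_result(result)
        return result
    except asyncio.CancelledError:
        # Cancel waiters instead of leaving them hanging forever.
        if not future.done():
            future.cancel()
        raise
    except Exception as exc:
        if not future.done():
            future.set_exception(exc)
        raise
    finally:
        _inflight.pop(key, None)
```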
classification_router.py
- asyncio.Semaphore(10) to limit concurrent LLM API calls, preventing rate-limit errors and cost explosions during traffic spikes
- MAX_MESSAGE_LENGTH check with fallback classification (a minimal sketch follows below)

Cost Optimization Summary
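To illustrate the classification_router.py items above, a semaphore-bounded LLM call with the length-based fallback might look like the sketch below; `_call_llm_api` and the fallback result shape are assumptions.

```python
import asyncio
import json

# Sketch of the semaphore-bounded LLM path; names other than those stated in the PR are assumptions.
MAX_MESSAGE_LENGTH = 10_000              # ~10KB guard described under cache_helpers.py
_llm_semaphore = asyncio.Semaphore(10)   # at most 10 concurrent LLM calls

async def _classify_with_llm(message: str) -> dict:
    if len(message) > MAX_MESSAGE_LENGTH:
        # Oversized input: return a conservative fallback instead of calling the LLM.
        return {"label": "unknown", "source": "fallback", "reason": "message too long"}

    async with _llm_semaphore:
        raw = await _call_llm_api(message)  # assumed wrapper around the LLM client
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"label": "unknown", "source": "fallback", "reason": "unparseable LLM response"}
```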
📷 Screenshots or Visual Changes (if applicable)
N/A — Backend-only changes with no visual impact.
🤝 Collaboration
Collaborated with: N/A
✅ Checklist
Summary by CodeRabbit
Release Notes
New Features
Refactor